DOCUMENT RESUME

ED 114 103                                              IR 002 733

AUTHOR        Williams, Martha E.
TITLE         The Impact of Machine-Readable Data Bases on Library
              and Information Services. National Program for
              Libraries and Information Services Related Paper No.
              26.
INSTITUTION   National Commission on Libraries and Information
              Science, Washington, D.C.
PUB DATE      Apr 75
NOTE          35p.; For related documents see ED 100 387-97; IR 002
              728-34

EDRS PRICE    MF-$0.76 HC-$1.95 Plus Postage
DESCRIPTORS   Computers; *Data Bases; *Information Retrieval;
              Information Science; *Information Services;
              Information Storage; Library Services; Library
              Technical Processes; National Programs; *Networks;
              Telecommunication
IDENTIFIERS   *National Commission Libraries Information Science;
              NCLIS



Abstract

The growth and proliferation of machine-readable data bases have
created a need to consider the nature of recent developments, the impact
of those developments on library and information services, and their
relationship with the National Commission on Libraries and Information
Science (NCLIS) program. Data bases may contain information in a variety
of forms, may be produced by government or private business, and may be
discipline, mission, or problem oriented, or inter- or
multi-disciplinary. The availability of such data bases may cause
changes in such library activities as journal acquisition and
interlibrary loans; or libraries may provide search services, act as
intermediaries in preparing searches, or refer people to appropriate
information services. The role of NCLIS should be to support education,
training, and research in the use of data bases, help expand service to
new constituencies, encourage improvement of retrieval systems, promote
the use of telecommunications, and provide a basis for networks and data
base sharing in all sectors of the information community. (LS)



The Impact of Machine-Readable Data Bases
on Library and Information Services



Introduction



The purpose of this paper is: to outline developments in the area of
machine-readable data bases; to assess the impact of these developments on
library and information services; and to relate them to the National
Program for Library and Information Services that has been proposed by the
National Commission on Libraries and Information Science.

First of all, one must know what a machine-readable data base is. It is
an organized collection of information in machine-readable form. The
collected information may be of several types: bibliographic, or
bibliographic-related; natural language text; numerical; or
representational. Examples of bibliographic data bases are the MARC II
data base of the Library of Congress and the Chemical Abstracts Service's
(CAS) Condensates tapes. The CASIA (Chemical Abstracts Subject Index
Alerts) tapes, which contain subject index terms and postings that consist
of Chemical Abstracts citation numbers, are a bibliographic-related data
base because the citation number refers the user to other tapes or
hard-copy sources that contain the full bibliographic record for the
citation. A natural language text data base would be the text portion of
the New York Times Information Bank, which contains not the full text of
the articles from newspapers but textual summaries or abstracts of the
articles. System 50 for State Statutes of Aspen Systems Corporation, an
example of a full text data base, contains over 200 million words of
statute law in the form of full text. Examples of numeric data bases are
numerous, but a familiar one would be the U.S. census tapes containing
current census data and produced by the United States Bureau of the
Census. A data base that contains not alphameric data but graphic or
pictorial representations, such as the CAS Registry Structure data base,
which contains chemical structures, is referred to as a representational
data base. One then can see that there are many types of data bases,
containing many types of information which is represented in many ways.
In this paper the term data base will refer to bibliographic data bases
and no other types unless specifically indicated.

The past decade has seen a considerable number of technological advances
and developments in the machine-readable data base field and these have had
a decided impact on the types of search services provided to users. Far
more data bases exist today than at any time in the past, and far more
users are receiving search services from machine-readable data bases than
at any time in the past. Aside from advances in the areas of computer
technology, storage, and communications, the very simple fact that large
numbers of machine-readable data bases (corresponding to many of the most
heavily searched abstracting and indexing services) now exist and can be
searched is significant. This was not the case ten years ago. We now have
machine-readable data bases in almost all of the major fields of science
and technology, as well as data bases covering news articles, legal cases
and statutes, drug and poison information, etc. Efforts are underway and
certainly more work is needed to generate data bases that would provide
community service type information such as consumer, day-care, legal aid,
recreational and leisure time activities information, etc. There are
hundreds of valuable publicly available data bases and many more private
data bases. The problem is making them known, understood and used by
researchers and the public at large.

Who produces or generates the data bases? They are produced both by
government sources and within the private sector. Included in the private
sector are profit-making and not-for-profit organizations such as
professional societies. Although the government is responsible for the
generation of numerous data bases, in many cases the actual production
work is carried out under contract by not-for-profit or commercial
organizations.

Many of the largest and most heavily used data bases were produced by the
federal government. Some examples of these are: the MEDLARS (Medical
Literature Analysis and Retrieval System) tapes produced by the National
Library of Medicine; the MARC II (Machine-Readable Cataloging) tapes
produced by the Library of Congress; the ERIC (Educational Resources
Information Center) tapes of the National Institute of Education; the DDC
tapes (Defense Documentation Center tapes) of the Department of Defense's
Defense Documentation Center; GRA (Government Research Announcements) of
the National Technical Information Service (NTIS); and the STAR
(Scientific and Technical Aerospace Reports) tapes produced by the
National Aeronautics and Space Administration. The fact that government
generated data bases are heavily used is a function not only of their
usefulness but also of the fact that their production and use are
subsidized by the government.

Many of the large scientific, technical, and discipline oriented data
bases have been produced by professional and technical societies in the
not-for-profit part of the private sector. Some of these are: the SPIN
(Searchable Physics Information Notices) tapes of the American Institute
of Physics; BA Previews (Biological Abstracts Previews) of Biosciences
Information Service; CA Condensates of Chemical Abstracts Service; PATELL
(Psychological Abstracts Tape Edition-Leased or Licensing) of the American
Psychological Association; COMPENDEX (Computerized Engineering Index) of
Engineering Index, Inc.; and METADEX (Metals Abstracts Index) of the
American Society for Metals. These data bases are produced within the
private sector; however, many of them have had research and development
funds from the government which helped them to get started or to conduct
research associated with systems or products.

The number of for-profit organizations producing data bases is small but
some of the data bases are very important. For example: the Institute for
Scientific Information publishes the Science Citation Index (SCI) tapes
and the Social Science Citation Index (SSCI) tapes; Excerpta Medica is
produced by the Excerpta Medica Foundation; the F & S Index of
Corporations and Industries is produced by Predicasts, Inc.; and the New
York Times Information Bank is produced by the New York Times.

These are but a few of the machine-readable data bases in use today. They
are generated by government, for-profit and not-for-profit organizations
and their orientation may vary. They may be discipline oriented, mission
oriented, problem oriented, inter-disciplinary or multi-disciplinary.
There are many of them and the level of use is rising rapidly. The data
bases may be processed by centers that provide services directly to users,
or services may be provided indirectly through brokers or service centers.
The searches may be conducted on-line or in the batch mode.

Data base searching has a direct impact on libraries in several ways: it
may affect the acquisition policy of the library--either increasing or
decreasing acquisitions by pointing out the non-use of some journals or
the need for other journals; it may affect the interlibrary loan traffic
of the library as either a borrowing organization or a lending
organization--depending on the correspondence between the library's
serials and monograph collections and the citations retrieved from data
base searches; the library may expand or deepen its services by offering
data base search services from data bases it processes; it may offer data
base services to its clientele by functioning as an intermediary,
preparing search questions and processing them via an on-line service or
through another center; or the library may function as a referral center,
directing its customers to the appropriate data bases and service centers.

Data bases relate to the National Program not only with respect to the
service aspect, which bears directly on individual libraries, but also in
areas where NCLIS has specifically expressed concern. There is a need for
training programs to prepare librarians and information scientists to work
with data bases. There is a need for continued federally supported
information science and communication science research which would affect
data base use. There is a need to expand data base content to serve new
constituencies. There is a need to recognize and cooperate with the
private sector in the generation and use of data bases. There is a need
for development of resource locators and document delivery systems for
closing the information retrieval (data base retrieval) loop. There is a
need for working towards a reduced rate for telecommunications (for
information transfer) in order to promote and expand data base use and
provide service to a wider range of users. And, above all, there is a need
for data base sharing via networks--including all sectors of the
information community.



Data Base Origins and Generation

Why were so many machine-readable data bases produced in the late 1960's
and early 1970's? A data base exists once a file has been converted to
machine-readable form. A few data bases were generated specifically for
the purpose of information retrieval, but because the cost of data input
is high and could seldom be justified for purposes of retrieval alone,
many more data bases were created as by-products of other activities. Some
were created because machine-readable data was needed as a component of a
computerized process control or production system for publishing primary
journals, indexes or abstracting journals. Others were created as a result
of the fact that computerized typesetting was used to produce a hard-copy
publication. Computers have proved to be economic and effective tools for
producing primary and secondary publications. Consequently, every time a
publisher uses photocomposition a potentially machine-searchable file
exists. The machine-readable file, once created, can be automatically
reorganized, merged with other machine-readable files, reformatted, and
repackaged to meet the demands of various markets. It is obvious that
machine-readable files are considerably more flexible and can serve many
more functions than can hard-copy records.

The by-product aspect of machine-readable data bases is no longer the
raison d'être for many of today's major data bases. Many publishers or
organizations in the business of information handling have adopted a "data
base approach" to management of their processing systems and distribution
of their information files. In such organizations the data base management
system impacts all the information functions of the
organization--abstracting, editing, indexing, authority files, production
schedules, sequencing of tasks, etc.--through to the composition and
distribution of products, whether printed, microform, or machine-readable.
"The data base approach [which has been adopted by many publishers]
asserts that there exists, for each enterprise, an accumulation of
information that is pivotal to its operation. This concept implies that
the description and treatment of such a collection should not be oriented
toward specific processes but should be determined by the value and
character of the information itself. An integrated data base usually means
an organized collection of computer-readable information in which the
information about each entity is recorded once in standardized form, and
all access to that information is achieved through indexes and
cross-references to the basic record and the authority files that support
it." (Ref. No. 1) An integrated data base management system, then,
requires definition, design and standardization of the data elements that
comprise the files.



Technical Aspects--Data Base Formats and File Structures

Before discussing data base searching and search services, a few
distinctions are in order. It is helpful to understand the makeup of a
data base in terms of the records and data elements that comprise it; the
difference between the distribution format of a data base as it is
provided by the producer and the format of a data base as the processor
has structured it for searching; the meaning of batch, on-line and
interactive; and, of course, the difference between retrospective and
current awareness searching. Data elements are the basic building blocks
of data bases. In the case of bibliographic data bases some of the generic
names of the elements may be author, title, journal name, volume number,
issue number, date of publication, index term, keyword, abstract,
publisher name, etc. The data elements are the smallest units or elements
that comprise the records (in this case bibliographic records) which, in
turn, comprise the file. A searcher is permitted to access individual
records within the file or individual elements within the records. Thus,
one can require specific index terms in a search question; the computer
searches the index portion of the records in the data base to locate term
matches and then produces the records that match the question. On the
other hand, if the searcher knows the citation or reference number of
certain desired records he may specify these and the records will be
printed out or displayed.

It is possible to search specified data elements within an entire record
either because the elements are identified or tagged with unique codes, or
because the position of an element within a record may specify the type of
element it is. Often a directory is associated with each record and it
specifies the elements that are present, their location in the record, and
the length (number of alphanumeric characters) of the data content of the
data element. The standard arrangement of data element tags, data content,
and directory information for the records is referred to as the format of
the record, and the arrangement of the records on a tape or other media is
referred to as the file structure or file format.
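The directory idea described above can be illustrated with a minimal
sketch. The layout used here is an assumption made up for the example (a
three-character tag, a four-digit length and a five-digit starting
position per directory entry, with "#" closing the directory); it is not
the MARC II format or the ANSI interchange format, and the tags AUT, TIT
and IDX are hypothetical.

```python
# Hypothetical record layout for illustration only; not the MARC II format.
# Each directory entry = 3-char tag + 4-digit length + 5-digit start offset.
ENTRY_LEN = 12
DIRECTORY_END = "#"  # assumed terminator between directory and data content


def build_record(elements):
    """Pack (tag, content) pairs into a directory followed by data content."""
    directory, data, offset = [], [], 0
    for tag, content in elements:
        directory.append(f"{tag}{len(content):04d}{offset:05d}")
        data.append(content)
        offset += len(content)
    return "".join(directory) + DIRECTORY_END + "".join(data)


def parse_record(record):
    """Use the directory to pull each tagged element out of the record."""
    end = record.index(DIRECTORY_END)  # first "#" closes the directory
    directory, data = record[:end], record[end + 1:]
    elements = {}
    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # A tag may repeat (e.g. several index terms), so collect a list.
        elements.setdefault(tag, []).append(data[start:start + length])
    return elements


record = build_record([
    ("AUT", "Williams, Martha"),
    ("TIT", "The Impact of Machine-Readable Data Bases"),
    ("IDX", "data bases"),
    ("IDX", "information retrieval"),
])
fields = parse_record(record)
print(fields["IDX"])
```

Because the directory records where each element starts and how long it
is, a search program can go straight to, say, the index terms without
scanning the whole record, which is the point of tagging the elements.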

Unfortunately, file formats and record formats are not standardized nor
are the definitions, contents, and representations of the data elements.
Librarians tend to define data elements functionally as seen in the MARC
II format of the Library of Congress, while information scientists
generally define elements on the basis of content. There are almost as
many data base formats as there are data bases, which leads to some
confusion and, of course, added expense in processing tapes because it
requires the processor of multiple tapes to either develop multiple search
programs, or to reformat all incoming tapes to a standard format. One
important standard has been developed by the American National Standards
Institute, Inc. (ANSI) for interchange (transmittal) of bibliographic
records--the "American National Standard for Bibliographic Information
Interchange on Magnetic Tape." The MARC implementation of this standard
has been proposed as a Federal Information Processing Standard (FIPS) and,
barring problems, it will go into effect as a federal standard in August
1975. This standard deals only with the format for records on tape or the
generalized structure but not the contents of the records. It does not
define data elements or tags, specify required data elements, or specify
data representation beyond that of the required character set.

Most data bases are distributed by their producers to processors in the
form of sequentially arranged records on magnetic tape. The processors may
either search the file in the distribution format or they may reformat the
data base and store it on tape, disc or other media for searching. There
are conceptually two basic structures for searchable files--sequential and
inverted. (In fact, there are other forms or physical representations of
the basic structures, e.g., direct access or index-sequential. There are
also many ways of using several structures for different parts of the same
data base.) In a sequential file records are arranged in sequence with all
of the elements for a given record retained in one place and identified by
the record ID. In an inverted file the searchable data elements are sorted
with all postings (record ID's) that pertain to a given entry (e.g., an
index term or title term) associated with that entry. Thus, for example,
all ID's for records containing the term "CANCER" would be posted to that
term. When an inverted file is used for searching, certain designated
elements may be inverted while other elements remain with the full
bibliographic record file which is used for producing the output. Elements
such as author names, patent numbers, and key words are useful search
terms and so are inverted. But other elements are not inverted because
they would be of little or no value as search terms. Page numbers, for
example, are not inverted as it is unlikely a user would ask for all
articles that begin on page 422.
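The difference between the two structures can be sketched in a few lines.
The example below is only an illustration: the three records, the element
names, and the choice of which elements to invert are made up, and a real
processor would keep the postings on disc rather than in memory.

```python
from collections import defaultdict

# Sequential file: each record keeps all of its elements together under its ID.
# The records and element names are invented for illustration.
sequential_file = [
    {"id": 1, "author": "Smith", "keywords": ["cancer", "therapy"], "page": 422},
    {"id": 2, "author": "Jones", "keywords": ["cancer", "virus"], "page": 17},
    {"id": 3, "author": "Smith", "keywords": ["polymer"], "page": 101},
]

# Only elements that are useful as search terms are inverted;
# page numbers are deliberately left out.
INVERTED_ELEMENTS = ("author", "keywords")


def invert(records, elements=INVERTED_ELEMENTS):
    """Build an inverted file: (element, TERM) -> sorted record IDs (postings)."""
    postings = defaultdict(set)
    for rec in records:
        for element in elements:
            values = rec[element]
            if isinstance(values, str):
                values = [values]
            for term in values:
                postings[(element, term.upper())].add(rec["id"])
    return {key: sorted(ids) for key, ids in postings.items()}


def lookup(inverted, element, term):
    """Return the postings for one entry without reading any full record."""
    return inverted.get((element, term.upper()), [])


inverted_file = invert(sequential_file)
cancer = lookup(inverted_file, "keywords", "cancer")
smith = lookup(inverted_file, "author", "Smith")
smith_and_cancer = sorted(set(cancer) & set(smith))
print(cancer, smith_and_cancer)
```

A question that combines terms is answered by intersecting postings lists;
the full bibliographic records are consulted only to produce the output
for the IDs that survive.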

An on-line system is one in which the user--through a terminal--is in
direct communication with the central processing unit of the computer. An
interactive system is one in which there is literally an interactive
two-way communication between the user and the machine, and the time for
response by the machine is, or should be, immediate. On-line searches of
bibliographic data bases are usually run against inverted dictionary-type
files. A batch processing system, on the other hand, is one in which
multiple jobs or search questions are "batched" together and run at one
time. The search questions may or may not be entered via a terminal but
they are saved until the time of the batch run. Searches against a
serially or sequentially arranged file are usually run in the batch mode
because the basic cost of spinning the tape once can be spread over
several search questions rather than requiring one question to bear the
total cost. There is, of course, some incremental cost for processing the
additional questions.
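The economics of the batch mode can be made concrete with a small sketch.
Everything in it is assumed for illustration: the tape contents, the
question names, the rule that a question matches when all of its terms are
present, and the cost figures, which are arbitrary units rather than
actual charges.

```python
def run_batch(tape, questions):
    """Read the sequential file once and test every saved question per record."""
    results = {name: [] for name in questions}
    records_read = 0
    for record in tape:  # a single pass over the tape serves the whole batch
        records_read += 1
        terms = set(record["terms"])
        for name, required in questions.items():
            if required <= terms:  # every required term is present
                results[name].append(record["id"])
    return results, records_read


def cost_per_question(pass_cost, incremental_cost, n_questions):
    """Share the fixed cost of one tape pass; each question adds its own cost."""
    return pass_cost / n_questions + incremental_cost


tape = [
    {"id": 1, "terms": ["cancer", "therapy"]},
    {"id": 2, "terms": ["polymer", "catalyst"]},
    {"id": 3, "terms": ["cancer", "virus"]},
]
questions = {
    "Q1": {"cancer"},
    "Q2": {"cancer", "virus"},
    "Q3": {"polymer"},
}
results, records_read = run_batch(tape, questions)
print(results, records_read)

# Hypothetical figures: 100 units per tape pass, 2 units per extra question.
print(cost_per_question(100.0, 2.0, 1), cost_per_question(100.0, 2.0, 20))
```

With these invented figures a question run alone bears the full cost of
the pass, while a question in a batch of twenty bears one twentieth of it
plus its own incremental cost.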

Retrospective and current awareness searches differ with respect to the
currentness of the files against which they are processed, and with
respect to the number of times the question is run against the files. A
retrospective question is one which is run against older, historical or
past files, whereas a current awareness search is run against only the
current or most recent file. A retrospective question is usually run once
against a collection of many data base issues or volumes, while a current
awareness profile is run many times--each time against a different issue
of the data base. Computerized current awareness systems are usually
called SDI (selective dissemination of information) systems. Information
is searched for and selected from the file in accordance with the user's
search profile. The output or search results are disseminated to the
user(s). In the case of SDI, once a profile of the user's interests is
developed and refined, it is run on a regular basis against new issues of
the data base(s) requested by the user. SDI searches are usually run in
the batch mode against sequential files. After the SDI run is completed,
the tape is added to the retrospective file for the appropriate data base.
Several of the on-line services now offer SDI in addition to
retro-searching. Since they have to process the incoming new data base
issues as they arrive in order to add them to the retrospective files,
they can conduct the SDI searches at the time of that initial processing.
In these cases search output can be disseminated to the user through the
mail or stored for later retrieval through his terminal. Retrospective
questions may be run in either the batch or on-line modes depending on the
system where the question is processed. In most cases the file that is
searched is in inverted form for fast searching.
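The relationship between SDI runs and the retrospective file can be
sketched as follows. This is only an illustration under simplifying
assumptions: the class name DataBase, the record layout, the accession
numbers, and the rule that a record matches when it shares at least one
term with the profile are all hypothetical and are not taken from any
system named in this paper.

```python
def matches(record, profile):
    """A record matches when it carries at least one term from the profile."""
    return bool(set(record["terms"]) & profile)


class DataBase:
    """Toy model: standing SDI profiles plus an accumulating retrospective file."""

    def __init__(self):
        self.retrospective_file = []  # all issues processed so far

    def run_sdi(self, new_issue, profiles):
        """Run every standing profile against the new issue, then file the issue."""
        output = {
            user: [r["id"] for r in new_issue if matches(r, profile)]
            for user, profile in profiles.items()
        }
        self.retrospective_file.extend(new_issue)
        return output

    def run_retrospective(self, question):
        """Run a one-time question against all accumulated issues."""
        return [r["id"] for r in self.retrospective_file if matches(r, question)]


profiles = {"user_a": {"laser"}, "user_b": {"polymer", "resin"}}
issue_1 = [
    {"id": "75-001", "terms": ["laser", "optics"]},
    {"id": "75-002", "terms": ["polymer"]},
]
issue_2 = [
    {"id": "75-003", "terms": ["laser"]},
    {"id": "75-004", "terms": ["enzyme"]},
]

db = DataBase()
first = db.run_sdi(issue_1, profiles)    # same profiles, first issue
second = db.run_sdi(issue_2, profiles)   # same profiles, next issue
retro = db.run_retrospective({"laser"})  # asked once, over everything filed
print(first, second, retro)
```

The profile is reused unchanged on each new issue, while the retrospective
question is posed once against the whole accumulated file, which is the
distinction drawn in the paragraph above.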

» 

SDI and retrospective searches differ in purpose. The purpose of a
retro-search may be to provide the user with: a) a few relevant references
to become acquainted with a topic; b) a thorough coverage of the
literature on a particular subject; or c) one or more references that
contain the answer to a specific question. These searches are conducted on
demand and in "past" or retrospective files. The completeness of the
search question processed against the file varies considerably with the
user's purpose. In contrast, SDI searches are conducted in order to keep
the user up to date with the published literature in his field. The user
profile is usually designed to be as complete as possible and to achieve
high recall. The same profile is used over and over against new issues of
the data base. The profile is, of course, modified over the course of a
year if changes in user interests or data base output indicate the need.
Since SDI and retrospective searches of data bases differ in purpose,
comparisons of the two with respect to performance and cost make little
sense. They are different services.

Data Base Characteristics--Criteria for Use

Subject Content

There are many different data bases and their differences can be described
in terms of their characteristics. It is on the basis of various
combinations of characteristics that a user or center decides to search or
offer services from a given data base. Certainly the first and most
important difference is that associated with the subject matter covered by
the data base. A data base with appropriate coverage is needed to effect a
proper match between the user's question and the data base to be searched.
As described earlier, data bases are discipline oriented, mission
oriented, problem oriented, or even multi-disciplinary or
inter-disciplinary in character. Examples of disciplinary data bases are
CA (Chemical Abstracts) Condensates, Polymer Science and Technology
(POST), PATELL (Psychological Abstracts Tape Edition Leased or Licensed)
and MEDLARS (Medical Literature Analysis and Retrieval System). Examples
of mission oriented data bases are the Nuclear Science Abstracts data base
produced by the Atomic Energy Commission and the STAR (Scientific and
Technical Aerospace Reports) data base of the National Aeronautics and
Space Administration. Problem oriented data bases are HEEP (Abstracts on
Health Effects of Environmental Pollutants) and PIP (Pollution Information
Project), a data base prepared by the National Science Library of Canada
using input that is selected from several commercially available data
bases. An inter-disciplinary data base would be CBAC (Chemical and
Biological Activities) and two multi-disciplinary data bases are the
Institute for Scientific Information's Science Citation Index (SCI),
covering virtually all areas of science and technology, and the MARC
(Machine-Readable Cataloging) tapes of the Library of Congress, covering
most of the monographic literature processed by the Library of Congress
regardless of subject content.

Other characteristics of data bases that affect the quality, timeliness
and thoroughness of search results and the cost of processing are:

  (a) the type of source material included (journal articles, monographs,
      reports, theses, government literature, critical reviews, book
      reviews, newspaper articles, patents, etc.);
  (b) completeness of coverage (cover-to-cover, selected articles,
      selected issues of a journal, "all" versus "selected items" of any
      type);
  (c) lapse time between the appearance of an item in the primary source,
      the secondary source, and the machine-readable data base (note that
      the machine-readable product may precede the printed secondary
      source);
  (d) indexing and coding practices employed (free language keywords,
      controlled and uncontrolled index terms, author titles versus
      augmented or edited titles, codes to indicate subject matter, types
      or classes of any sort, etc.);
  (e) availability of abstracts, extracts or text on the data base for
      search and/or display;
  (f) data elements included for search (access points) and/or display
      (author, title, journal references, index terms, codes, cited
      references, etc.);
  (g) size and growth rate (How many records or references are contained
      in the file from the first year of the data base through the last
      completed year? What is the size of an average record in terms of
      number of characters? And, what is the percentage growth rate of
      the data base per year?);
  (h) frequency of issue or update (How often are new issues of the data
      base produced?--weekly, semi-monthly, monthly, bimonthly, quarterly,
      annually, etc.).
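The three questions under size and growth rate combine into a simple
storage estimate. The sketch below assumes compound annual growth, and the
figures in the usage lines (one million records, 500 characters per
record, 10 percent growth) are invented for illustration and describe no
actual data base.

```python
def project_size(records_now, avg_record_chars, growth_rate, years):
    """Return (records, characters) after compounding growth for `years` years."""
    records = records_now * (1 + growth_rate) ** years
    return round(records), round(records * avg_record_chars)


# Hypothetical data base: 1,000,000 records averaging 500 characters each,
# growing by 10 percent per year.
today = project_size(1_000_000, 500, 0.10, 0)
in_two_years = project_size(1_000_000, 500, 0.10, 2)
print(today, in_two_years)
```

A processor deciding whether to mount a data base for searching would use
an estimate of this kind to judge how much tape or disc storage the
retrospective file will need.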

Another consideration is the data base's correspondence with hard-copy
publications. Some data bases have a 1:1 correspondence with printed
products, i.e., every record contained in the machine-readable form exists
in the hard-copy counterpart. Some include all of the same references but
without the abstracts. In some cases the data base is a subset version of
the hard-copy form and in other cases the reverse is true. A few data
bases exist only in machine-readable form; hence, it is not possible to
check search results against a hard-copy counterpart.

Data Base Search Services

What services are provided from data bases and who provides them? The
services most often provided are SDI and retrospective searches in either
the on-line or batch mode. An additional data base service offered by a
few organizations is a private library service. A service that is related
to data base searching but is, lamentably, seldom offered by the centers
that process data bases is that of document delivery.

A private library service is one that permits the user (individual or
company) to create his own machine-readable file either by designating
that output from his SDI runs be stored on his own tape or disc file, or
by specifying other records (e.g., company reports or items selected via
his own library searches) he would like to have entered into his file. He
may have his SDI output saved for several weeks or months until he wants
to look at it and then he can indicate which of the records should be
retained for his subsequent use. The advantage of the system lies in the
fact that every record in the user's file represents his own judgement.
The file is personally tailored and it is under his control.

Closing the information retrieval loop requires delivery to the user of
relevant documents identified through a data base search. In many ways it
seems that the search itself is the easier or at least less time-consuming
part of the information retrieval task. All too often a searcher completes
a successful search only to be frustrated by the inability to readily
obtain needed documents. The process of document delivery includes two
major components: the identification of the source location of the
document, and acquisition of the document. Delays associated with either
or both occur often. One may ask, what are or could be the roles and
responsibilities of the A & I (abstracting and indexing) services (the
major data base producers), information centers, and libraries in the
document delivery process? Very few data base processing centers handle
document acquisition. Generally it is left to the user to go to his
library to obtain copies or inter-library loan use of a document. Closer
ties between the organization that processes the data base and the library
that locates and orders the document might simplify the problem. This has
been done at Ohio State University's Mechanized Information Center where
document requisitions are handled by the center. What the user really
needs is not necessarily to be able to acquire the document immediately
following completion of a data base search; he needs to be able to by-pass
all the intervening activities, time lags, etc. involved in locating the
source and ordering the document. During 1975 two data base producers, ISI
and NTIS, simplified the document acquisition process for documents cited
in their own data bases. In the case of ISI, on-line users of their data
base through Lockheed or System Development Corporation (SDC) are now able
to use the accession number to order full text copies of relevant items
through ISI's Original Article Tear Sheet service (OATS). A specific
command is provided for the searcher to order copies of desired items
directly on-line through the system. The article order is subsequently
transmitted to ISI headquarters in Philadelphia where the orders are
filled. Ordered items are mailed to the requestor within 24 hours. A
similar capability is available for on-line ordering of NTIS documents.

Possibilities for solving the document delivery problem include: (1) the
data base producer's maintaining copies of all documents cited in his data base,
as done by ISI and NTIS; (2) the developing for on-line searching of one or
several union lists of serials with holdings information (lists could be
prepared on a national, state, or regional basis); (3) a national serials
resources center functioning as a central depository. Any one of these
solutions would simplify the problem of knowing where to find the document. The
subsequent ordering can be simplified because we do in fact have a nation-wide
communication network. The actual reproduction and delivery of the document is
a separate problem. ISI's solution via use of tear sheet copies is certainly a
good one and takes care of the copyright problem. The more common solution--the
use of copying equipment--appears to be the easiest way of producing copies,
though the legality still remains ambiguous. Facsimile transmission is used in
cases where fast delivery is mandatory but this technique is still very
expensive. However, if the National Commission were able to obtain lower
communication rates for information transmission the expense would be reduced.
The most inexpensive means of transmitting copies is still the U. S. Mail
Service.

"SDI is one of the most successful information services developed in the
past decade. A number of factors have led to the development of SDI:
increased availability of computers; the automatic generation of data bases
through computer typesetting; expansion of the world's literature; and the
increasing cost of labor-intensive information services." (Ref. No. 2) SDI has
also been the primary use to which data bases were put in the early years of
data base development. The reason that computer searching of retrospective data
bases did not come into its own until the middle 1970s is largely that only a
few data bases existed with a sufficient number of years' worth of material to
make retro-searching worthwhile. A weekly or monthly SDI service from a data
base during its first year is useful, but a search of a 5-month or 10-month
collection of a data base is not very useful for retrospective search purposes.
The situation has changed in the past year or two. The use of on-line search
services for retrospective searching has grown by leaps and bounds. The number
of on-line retrosearches conducted in 1974 has been estimated to be 700,000
(excluding library function uses of cataloging data files, e.g., OCLC) and the
figure is likely to be 1,000,000 in 1975. (Ref. No. 3) Reasons for this fast
take-hold of on-line services are: user familiarity with other types of on-line
systems such as airline space location systems; the availability of a
sufficient number of years' worth of data base cumulations to make
retrospective searching useful; the relatively low cost of on-line searches;
and user familiarity with data bases via prior use of SDI services.
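
As a rough check on the growth implied by these two estimates, the short Python
sketch below computes the percentage increase from the 1974 figure to the
projected 1975 figure. The two counts are the ones quoted above (Ref. No. 3);
the function and variable names are illustrative only and do not come from the
report.

```python
def percent_growth(earlier: float, later: float) -> float:
    """Return the percentage increase from `earlier` to `later`."""
    if earlier <= 0:
        raise ValueError("earlier count must be positive")
    return (later - earlier) / earlier * 100.0


# Estimates quoted in the text (Ref. No. 3).
SEARCHES_1974 = 700_000      # estimated on-line retrosearches in 1974
SEARCHES_1975 = 1_000_000    # projected figure for 1975

growth = percent_growth(SEARCHES_1974, SEARCHES_1975)
print(f"Projected growth 1974-1975: {growth:.1f}%")
```

On these figures the projected one-year increase is a little under 43 percent.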

Data base search services may be used directly by the end-user or indirectly
through centers or brokers. The term "center" refers to organizations
that acquire and process data bases themselves and provide services to users
who may be within their own organization or outside of it. The term "broker"
refers to a person or organization that searches data bases on-line at another
location, or purchases data base searches from another center, for its own
customers. The broker does not process the data bases but does provide search
services from them. Obviously, the use of a center or broker involves some
additional cost. The added cost for the middleman must then be offset by the
added value provided by the middleman. The added value may consist of:

. augmentation, analysis, screening or interpretation of output
. know-how in effectively using other search services
. knowing where to go to find the appropriate service or data base
. document location and delivery
. reduction of the purchaser's need for additional personnel with
  specific skills where the need for such skills may be sporadic
. reduction of the purchaser's need for equipment, e.g., terminals,
  readers, etc. where the frequency of use is not sufficient to
  support the equipment

Data base processing centers may be independent commercial organizations, or
they may exist in computer centers, libraries or information centers of various
sorts. More often than not, the processing of data bases has been done outside
of the library setting. The brokerage situation though is different, because
the brokerage organization needs little investment for equipment and has no need
for programmers and computer specialists. Reference librarians or information
specialists hired by libraries, or information scientists operating
independently, can establish a search service for users with little capital
investment. They can effectively function as intermediaries between users and
the systems.

Intermediaries

The computer is a tool that can greatly assist and speed up human activities
but it is not a substitute for intellectual activities. In information retrieval
the intellectual aspects of searching remain the prerogative of the searcher
while the repetitive, routine and non-intellectual tasks are handled quickly and
effectively by the computer.

Centers that operate computer-based information services, as well as
organizations that make use of on-line information services, in general provide
an intermediary between a user and the computer-based system. The intermediary
may be an information specialist, information scientist or reference librarian.
He or she handles the intellectual tasks of: selecting the appropriate system
and data base(s) for the user's question; negotiating the search question with
the user; developing the query or profile with an effective search strategy;
conducting the search; and possibly evaluating the output. Additionally, the
intermediary can not only maintain familiarity and expertise with systems and
their various features, data bases and vocabularies, but he/she can keep up with
the changes that are made in systems, data bases and vocabularies.

Beyond the benefits that accrue to the use of intermediaries for handling
data base searches, there are advantages associated with centralizing data base
search activities within an organization. Advantages include: (1) use of
knowledgeable intermediaries for effective searching; (2) minimizing the number
of personnel needed for data base searching; (3) distribution of the search
personnel costs over a wider base, the development and use of one system for
record keeping associated with searches; (4) developing in one location the
personnel with the ability to negotiate contracts for data base activities; and
(5) minimizing the number of contracts negotiated by the organization. An
intermediary may function as a searcher or in a referral capacity. In either
case he/she must be aware of a multiplicity of sources and services and this is
not a simple task.

Data Bases and the National Program

There are many ways in which the National Program relates to data bases and
these are in areas where NCLIS has expressed concern such as: education and
training; research; constituencies served; generation and use of data bases
within the private sector; resource locators; telecommunications; and network
resource sharing.

Education and Training 



In line with the NCLIS objective to "Develop and continually educate the human
resources required to implement a National Program..." (Ref. No. 4, p. 60) there
is a real need for both basic and continuing education to train personnel to
become skilled in the processing and searching of machine-readable data bases.
Courses dealing with data base preparation, processing and use as well as courses
in center operation and management are needed in the library and information
science schools. There is a crying need for information specialists trained in
modern techniques for information retrieval. "The spread of...modern methods of
retrieving information is not a disease-like process but rather a force that
makes it possible for the scientist to become society's eyes and ears--its overt
intelligence service." (Ref. No. 5) Scientists and the general public who have
information needs must be willing and able to define what they need to know.
Likewise, information specialists, brokers, and information service librarians
must know what resources (whether hardcopy, machine-readable or micrographic in
form) and services exist and are appropriate for the users' needs; and then
either refer the user to the proper source, or be able themselves to translate
those needs into search questions and conduct the search.

Unfortunately, today "...most of our educational institutions are not turning
out professionals who are technically equipped to deal with non-print materials
or with computer and communication technologies." (Ref. No. 4, p. [illegible])
It would certainly be beneficial to the information community as a whole for
NCLIS to promote, foster and fund specialized courses and programs in the use of
modern techniques for information retrieval.

Research

The Commission recognizes the role that the Office of Science Information
Service of the National Science Foundation (OSIS/NSF) plays as "...the principal
component of government responsible for information science research." (Ref.
No. 4, p. 85) OSIS/NSF has certainly played a significant role in the data base
research and development (R & D) field. It has been responsible for R & D
associated with design, management, and use of highly sophisticated data bases
such as the Chemical Abstracts Service's Registry System. It was responsible for
development of methodologies for analysis, use and evaluation of a wide variety
of data bases and data base services through sponsoring the design and
development of the university based centers at the University of Georgia, IIT
Research Institute, University of Pittsburgh, UCLA, Ohio State University,
Lehigh University, and Northeast Academic Science Information Center.

Results of several OSIS/NSF current research grants will greatly impact
on the data base field with respect to facilitating use of data bases and
resource sharing of data bases through networks. MIT is investigating the
feasibility of accessing multiple on-line systems and data bases through a common
query language and common vocabulary. (Ref. No. 6) EDUCOM is conducting a
sophisticated gaming and modeling study in order to investigate networking with
respect to the economic, administrative and managerial aspects of various
network configurations. (Ref. No. 7)

The University of Illinois is investigating the feasibility and utility of
data base mapping via the interconnections (commonality of data elements and
subject content) and potential routes that exist from one data base to another.
The existence and location of data base resources together with an indication of
the sequence in which they should be accessed will be shown. (Ref. No. 8) This
NSF sponsored program should help pave the way to meet the need expressed by
NCLIS: "Much of the success of a nationwide program will depend on knowing what
information is available where, and how to gain access to it." (Ref. No. 4, p. 98)
The System Development Corporation is studying the impact of on-line search
services (Ref. No. 9) and Lockheed Missiles and Space Company, Inc. is conducting
an experiment in providing on-line reference retrieval services to the general
public through public libraries. (Ref. No. 10)

NSF sponsored research at CAS has certainly resulted in some of the most
significant developments in the field of chemical information, e.g., the
Registry System for identification of unique chemical compounds and the
development of schemes for substructure searching, automatic naming of compounds
from structures, and automatic development of structures from names. OSIS/NSF
has also sponsored work leading toward the reduction of duplicate efforts among
data base producers (Ref. Nos. 11, 12), the use of common or standardized data
elements on machine-readable files, and, in general, has effectively encouraged
cooperation among data base generators and among centers that process data
bases. The Commission would like to see a strengthening of the OSIS/NSF research
and development programs and this should be encouraged. The technological
feasibility of a national network and the advancement achieved in the
state-of-the-art of data bases would not be where they are today if it had not
been for research and development sponsored by the OSIS and the Division of
Computational Research of NSF.

Constituencies Served

The use of machine-readable data bases relates to the Commission's objective
to "...provide adequate special services to special constituencies, including
the unserved." (Ref. No. 4, p. 55) There are efforts underway to develop new
specialized data bases dealing with the problems of everyday life such as
consumer affairs, legal aid, day care centers, recreational, health, and social
services, etc. Such data bases may be developed for, and would be used by,
neighborhood or community information centers. In this way the depth of service
to current users would be expanded and services would be expanded to include new
clienteles and new constituencies.



Resource Locators 



One of the chief obstacles to wide use of available data bases and to the
sharing of resources is the lack of public knowledge about the existence and
location of available resources, whether they exist within the federal
government, state governments or the private sector.

There are aids such as: author addresses provided in many primary journals
and in ISI's Current Contents for facilitating the reader's ability to request
reprints from authors; simple article ordering services such as ISI's Original
Article Tear Sheet (OATS) service and the document ordering post cards bound in
a number of journals requiring the requestor to merely circle the ID number of
desired items; and, more recently, the provision of ordering from specific data
base producers via on-line search services. These services are good but they
cover only a limited number of resources. Other resource locators are needed.

The National Program can be instrumental in promoting and funding the
development of tools for locating resources--data bases, search services, and
back up documents. The location of data bases and search services will be
greatly assisted by the use of tools currently being developed within the private
sector by the not-for-profit organizations. Several surveys of sectors of the
data base community are underway via the American Society for Information
Science (ASIS), the Association of Scientific Information Dissemination Centers
(ASIDIC), the National Federation of Abstracting and Indexing Services and at
the University of Illinois (Ref. No. 8). The results of these efforts could be
made more widely available for network use via the National Program. Similarly,
but on a much larger scale, there is a need for a United States union list of
serials holdings on-line (as has been done in Canada) for location of hard-copy
documents to complement data base searching and complete the information retrieval
loop. The development of this data base related tool and the network for
accessing it would certainly be within the scope of the National Program.

Telecommunications

The continued and expanded use of machine-readable data bases is largely
dependent on telecommunications and, as on-line use of data bases grows, the
dependence will increase. The existence of common carrier communications
networks, such as TYMNET, has been instrumental in the development of on-line
data base services; however, the cost associated with communications has also
been a barrier to some information services. At present communication charges
represent approximately 10-20% of the out-of-pocket on-line search charges. The
number, of course, varies with the data base accessed and the user's location
with respect to the computer site where the data base is searched. If the
Commission were to effect a lower tariff rate for information transmission it
would certainly promote increased remote use of data bases, sharing of resources,
and the use of networks for resource location. It would also promote the use
of facsimile transmission for communication of information such as document
requests or document delivery, as a result of the data base searches.
Communication costs have long been a barrier to the use of facsimile transmission.
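
To make the 10-20% figure concrete, the Python sketch below splits a total
on-line search charge into its communication portion and shows how the total
would change if only that portion were reduced by a lower tariff. The share
range comes from the text; the $50.00 search charge and the 50% tariff cut are
hypothetical values chosen purely for illustration, and the function names are
not drawn from any existing system.

```python
def communication_cost_range(total_charge, low_share=0.10, high_share=0.20):
    """Return the (low, high) communication portion of a total search charge.

    The default shares reflect the approximate 10-20% range cited in the text.
    """
    return total_charge * low_share, total_charge * high_share


def charge_after_tariff_cut(total_charge, comm_share, tariff_cut):
    """Return the total charge when only the communication portion is reduced.

    `comm_share` is the fraction of the charge due to communications and
    `tariff_cut` is the fractional reduction in the communication rate.
    """
    if not 0.0 <= comm_share <= 1.0 or not 0.0 <= tariff_cut <= 1.0:
        raise ValueError("comm_share and tariff_cut must be between 0 and 1")
    communication = total_charge * comm_share
    return total_charge - communication * tariff_cut


# Hypothetical example: a $50.00 search and a 50% lower tariff (assumed values).
total = 50.00
low, high = communication_cost_range(total)
print(f"Communication portion: ${low:.2f} to ${high:.2f}")
for share in (0.10, 0.20):
    reduced = charge_after_tariff_cut(total, share, tariff_cut=0.50)
    print(f"Share {share:.0%}: total after tariff cut = ${reduced:.2f}")
```

Because communications account for at most about a fifth of the charge in this
range, even a large tariff reduction lowers the total search charge by a
correspondingly smaller fraction.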

Although there is today no national program or plan for networking and
resource sharing for machine-readable data bases, there is in fact a nation-wide
network over which data bases are shared. Specifically I refer to the use of
the TYMNET communications network for searching data bases via remote terminals
by many simultaneous users who are located virtually everywhere throughout the
country (in fact several non-U. S. countries use the same facilities).

Communication satellites are now operating in the U. S. and internationally.
This important development enlarges the nation's capability for exchange of
information in all forms, and a nation-wide information network as proposed by
NCLIS will need "...to integrate teletype, audio, digital, and video signals
into a single system." (Ref. No. 4, p. 82).

Resource Sharing and Networking

The number of data bases, size of data bases and associated costs of
operation provide the economic necessity for data base resource sharing. Any
discussion regarding the need for data base services today and in the future
necessarily involves a discussion of the reasons for, and advantages of, data
base resource sharing and networking. Although data base sharing can be
effected in many ways, the principal way in which sharing takes place today is
by remote accessing of data bases through communications networks. No data base
processing center, whether it exists in an academic, industrial, or governmental
organization or whether it functions through the computer center, information
center, or library, can afford to process and provide services from all of the
available data bases. Data base generation is expensive and so the costs of
production, which are passed on to organizations that process data bases (for
on-line or batch searching), are substantial. The cost of establishing and
maintaining processing/searching activities is also high as it involves
considerable investment in: data base purchase/lease/licensing; data base royalty
and access fees; materials and equipment; machine time; communications; and
personnel expenses. Additionally, the cost of preparing, negotiating, and
conducting searches is high.

The principal advantages of data base resource sharing and networking are:
availability of resources to a much larger community; reduced cost of data base
searches as a result of distributing fixed costs over a larger base; reduction
of the number of skilled personnel needed for processing data bases; accumulation
of a wider variety of experiences and "know how" in data base use; development
of impetus toward standardization of data base formats, element definitions,
formats for search strategies, access procedures and protocols, etc.;
availability of more resources at a single location; and availability of data
bases that individual user organizations would be unlikely to process internally
because of low demand within the organization.
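
The effect of distributing fixed costs over a larger base can be illustrated
with a simple average-cost calculation. The Python sketch below assumes a cost
model of fixed costs spread evenly over all searches plus a constant variable
cost per search; this model, the dollar amounts, and the search volumes are all
hypothetical and are offered only to show the shape of the relationship, not to
describe any actual center.

```python
def cost_per_search(fixed_costs, variable_cost_per_search, searches):
    """Average cost of one search when fixed costs are shared by all searches."""
    if searches <= 0:
        raise ValueError("searches must be a positive number")
    return fixed_costs / searches + variable_cost_per_search


# Hypothetical figures (assumptions, not data from the report).
FIXED_COSTS = 100_000.00   # annual data base leases, equipment, staff
VARIABLE_COST = 5.00       # machine time and communications per search

for volume in (1_000, 10_000, 100_000):
    average = cost_per_search(FIXED_COSTS, VARIABLE_COST, volume)
    print(f"{volume:>7,} searches per year: ${average:,.2f} per search")
```

Under these assumed figures the average cost falls steeply as volume grows and
then levels off toward the variable cost, which is why a center serving many
organizations through a network can charge less per search than one serving a
single small clientele.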

While resource sharing is largely done via communications networks and use of
on-line systems, other types of sharing exist. For example, centers that
process data bases themselves and provide services to clients (internal and/or
external) often require services for their own clients from data bases that are
processed in other centers. In such cases, two centers may exchange services
or sell services to each other. Centers that provide their own batch processing,
SDI, or retrosearch services often function as middlemen in accessing on-line
services for their clients. On the other hand, they may function in a referral
capacity in directing clients to the appropriate source.

Data Base Generation and Use in the Private Sector

NCLIS recognizes the necessity and advantages of accommodating, in the
National Program, the wide range of resources and services within the private
sector. They are an important part of the total information supply system today
and will continue to be in the future.

One of the major areas where the National Program relates to data bases is
the private sector which includes: the publishing industry; abstracting and
indexing services, many of whom produce machine-readable data bases; the
information industry and audio visual industry; and the special libraries in
business and industry, many of whom are users of data base products and services.
Additional members of the private sector that affect data base activities
directly or indirectly are the manufacturers of computers, terminals, user
communication equipment, and the operators of communications networks such as
TYMNET.

While access to government generated data bases such as ERIC, MEDLINE,
CAIN, and NTIS is important, it is equally important that researchers and
information service seekers in general have access to the New York Times
Information Bank, Chemical Abstracts Condensates, the American Institute of
Physics' SPIN, Engineering Index's COMPENDEX, PREDICASTS, and many others.
Coordinated access to all of these sources in both sectors is needed in order to
provide the types and levels of service required by users. The economic
viability of data bases in the private sector is obviously related to level of
use and rates charged for selling, leasing, licensing, and accessing the data
bases through second and third party users (processors and brokers). The
private sector recognizes and appreciates the fact that either directly or
indirectly it has benefited from the government's involvement in the data base
field. The government has subsidized many R & D programs in the private sector
associated with systems and products. It has also funded the initial planning
and development of centers that process data bases and sell service to users.
All of this has been instrumental in bringing the data base "industry" to its
current position. The private sector is mindful, however, that in some
instances the government has taken actions that may have a negative impact on
the private sector by way of intervention and competition. Just as no one wants
to kill the goose that laid the golden egg, it also seems unreasonable for the
goose to kill its offspring.

The data base service area is a new one and unfortunately it is not always
the case that the implications of actions taken unilaterally on the part of
either sector are fully understood at the time they are taken. NCLIS should be
mindful of the interfaces, interdependencies and separate responsibilities of
the two sectors in developing the National Program.



Data Base Problems and Future Trends

The major data base problems are not technical ones. They are legal,
political and psychological and are associated with a lack of national
leadership, cooperative resource sharing, network arrangements, competition,
marketing, copyright, standards, and continued economic viability. Hopefully,
NCLIS will be instrumental in solving some of these problems.

There are strong indications that in the future we will see more data
bases, covering more subject areas, with more special purpose subset and merged
data bases being developed; the volume of data base use will increase and the
user clientele served will represent more diverse constituencies; more data
bases will be made available on-line through networks and a larger share of the
total data base use will be on-line; there will be more involvement of librarians
in data base services and services will be made available through public
libraries as well as in the academic and industrial organizations; the
techniques of computational linguistics, automatic content analysis, and pattern
recognition will be employed on a larger scale; there will be more emphasis on
the man-machine interaction; and systems will become easier to use through
natural language communication.